Towards Knowledge-Grounded Counter Narrative Generation for Hate Speech
Tackling online hatred using informed textual responses, called counter
narratives, has recently been brought into the spotlight. Accordingly, a
research line has emerged that aims to automatically generate counter narratives
in order to facilitate direct intervention in hateful discussions and to prevent
hate content from spreading further. Still, current neural approaches tend to
produce generic or repetitive responses and lack grounded, up-to-date evidence
such as facts, statistics, or examples. Moreover, these models can create
plausible but not necessarily true arguments. In this paper we present the
first complete knowledge-bound counter narrative generation pipeline, grounded
in an external knowledge repository that can provide more informative content
to fight online hatred. Together with our approach, we present a series of
experiments showing that it can produce suitable and informative
counter narratives in in-domain and cross-domain settings.
Comment: To appear in "Proceedings of the 59th Annual Meeting of the
Association for Computational Linguistics (ACL): Findings"
Generating Counter Narratives against Online Hate Speech: Data and Strategies
Recently, research has started to focus on avoiding the undesired effects that
come with content moderation, such as censorship and overblocking, when dealing
with online hatred. The core idea is to intervene directly in the discussion
with textual responses that are meant to counter the hate content and prevent
it from spreading further. Accordingly, automation strategies, such as natural
language generation, are beginning to be investigated. Still, these strategies
suffer from a lack of sufficient high-quality data and tend to produce
generic or repetitive responses. Being aware of these limitations, we
present a study on how to collect responses to hate effectively, employing
large-scale unsupervised language models such as GPT-2 to generate
silver data, and on the best annotation strategies and neural architectures
for filtering data before expert validation and post-editing.
Comment: To appear at ACL 2020 (long paper)
Scent Mining: Extracting Olfactory Events, Smell Sources and Qualities
Olfaction is a rather understudied sense compared to the other senses. In NLP, however, there have been recent attempts to develop taxonomies and benchmarks specifically designed to capture smell-related information. In this work, we further extend this research line by presenting a supervised system for olfactory information extraction in English. We cast this problem as a token classification task and build a system that identifies smell words, smell sources and qualities. The classifier is then applied to a set of English historical corpora covering different domains and written between the 15th and the 20th century. A qualitative analysis of the extracted data shows that it can be used to infer interesting information about smell-related items such as tea and tobacco from a diachronic perspective, supporting historical investigation with corpus-based evidence.
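Casting extraction as token classification means every token receives one label. A common encoding for this is the BIO scheme (B = beginning of a span, I = inside, O = outside); the abstract does not state which scheme the system uses, so the sketch below is only an illustration under that assumption. The label names `SMELL_WORD`, `SMELL_SOURCE` and `QUALITY` and the helper `spans_to_bio` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical span annotation: (start_token, end_token_exclusive, label)
Span = Tuple[int, int, str]


def spans_to_bio(num_tokens: int, spans: List[Span]) -> List[str]:
    """Convert labelled token spans into one BIO tag per token (assumed scheme)."""
    tags = ["O"] * num_tokens  # tokens outside any span stay "O"
    for start, end, label in spans:
        tags[start] = f"B-{label}"  # first token of the span
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"  # remaining tokens of the span
    return tags


# Toy sentence with hypothetical labels for a quality, a smell word and a source.
tokens = ["the", "sweet", "smell", "of", "burning", "tobacco"]
spans = [(1, 2, "QUALITY"), (2, 3, "SMELL_WORD"), (4, 6, "SMELL_SOURCE")]
for tok, tag in zip(tokens, spans_to_bio(len(tokens), spans)):
    print(f"{tok}\t{tag}")
```

A token classifier is then trained to predict these per-token tags, and contiguous B/I runs are read back as extracted smell words, sources and qualities.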
Using Pre-Trained Language Models for Producing Counter Narratives Against Hate Speech: a Comparative Study
In this work, we present an extensive study on the use of pre-trained
language models for the task of automatic Counter Narrative (CN) generation to
fight online hate speech in English. We first present a comparative study to
determine whether there is a particular Language Model (or class of LMs) and a
particular decoding mechanism that are most appropriate for generating CNs.
Findings show that autoregressive models combined with stochastic decoding are
the most promising. We then investigate how an LM performs in generating a CN
for an unseen target of hate. We find that the key element for
successful 'out-of-target' experiments is not an overall similarity with the
training data but the presence of a specific subset of training data, i.e. a
target that shares some commonalities with the test target that can be defined
a priori. Finally, we introduce the idea of a pipeline that adds
an automatic post-editing step to refine generated CNs.
Comment: To appear in "Proceedings of the 60th Annual Meeting of the
Association for Computational Linguistics (ACL): Findings"
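The abstract does not list which stochastic decoding strategies were compared. As one widely used example, nucleus (top-p) sampling keeps the smallest set of most probable next tokens whose cumulative probability reaches `p` and samples from that set, whereas greedy decoding always picks the single most probable token. The sketch below is illustrative only: the toy distribution and the function names are hypothetical, not the study's configuration.

```python
import random
from typing import Dict


def top_p_filter(probs: Dict[str, float], p: float) -> Dict[str, float]:
    """Keep the smallest set of most probable tokens whose cumulative
    probability reaches p, then renormalise so the kept mass sums to 1."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept: Dict[str, float] = {}
    mass = 0.0
    for token, prob in ranked:
        kept[token] = prob
        mass += prob
        if mass >= p:
            break
    return {token: prob / mass for token, prob in kept.items()}


def sample_top_p(probs: Dict[str, float], p: float, rng: random.Random) -> str:
    """Draw one token at random from the top-p (nucleus) set."""
    nucleus = top_p_filter(probs, p)
    tokens = list(nucleus)
    return rng.choices(tokens, weights=[nucleus[t] for t in tokens], k=1)[0]


# Toy next-token distribution (hypothetical values).
next_token_probs = {"everyone": 0.5, "people": 0.3, "we": 0.15, "banana": 0.05}
rng = random.Random(0)
print(top_p_filter(next_token_probs, p=0.9))  # "banana" falls outside the nucleus
print([sample_top_p(next_token_probs, 0.9, rng) for _ in range(5)])
```

Sampling from the truncated distribution lets repeated generations differ, which is one reason stochastic decoding can reduce the generic, repetitive outputs that greedy decoding tends to produce.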
Human-Machine Collaboration Approaches to Build a Dialogue Dataset for Hate Speech Countering
Fighting online hate speech is a challenge that is usually addressed using
Natural Language Processing via automatic detection and removal of hate
content. Besides this approach, counter narratives have emerged as an effective
tool employed by NGOs to respond to online hate on social media platforms. For
this reason, Natural Language Generation (NLG) is currently being studied as a
way to automate counter narrative writing. However, the existing resources
necessary to train NLG models are limited to 2-turn interactions (a hate speech
message and a counter narrative as response), while in real life, interactions
can consist of multiple turns. In this paper, we present a hybrid approach for
dialogical data collection, which combines the intervention of expert human
annotators with machine-generated dialogues obtained using 19 different
configurations. The result of this work is DIALOCONAN, the first dataset
comprising over 3,000 fictitious multi-turn dialogues between a hater and an
NGO operator, covering 6 targets of hate.
Comment: To appear in Proceedings of the 2022 Conference on Empirical Methods
in Natural Language Processing (long paper)
Toward Stance-based Personas for Opinionated Dialogues
In the context of chit-chat dialogues, it has been shown that endowing systems
with a persona profile is important for producing more coherent and meaningful
conversations. Still, the representation of such personas has thus far been
limited to a fact-based representation (e.g. "I have two cats."). We argue that
these representations remain superficial with respect to the complexity of
human personality. In this work, we take a step forward and investigate
stance-based personas, trying to grasp deeper characteristics, such as
opinions, values, and beliefs, to drive language generation. To this end, we
introduce a novel dataset that allows us to explore different stance-based
persona representations and their impact on claim generation, showing that
they are able to grasp abstract and profound aspects of the author persona.
Comment: Accepted at Findings of EMNLP 202
Building a Multilingual Taxonomy of Olfactory Terms with Timestamps
Olfactory references play a crucial role in our memory and, more generally, in our experiences, since researchers have shown that smell is the sense most directly connected with emotions. Nevertheless, only a few works in NLP have tried to capture this sensory dimension from a computational perspective. One of the main challenges is the lack of a systematic and consistent taxonomy of olfactory information in which concepts are also organised from a multilingual perspective. WordNet represents a valuable starting point in this direction, since it can be semi-automatically extended by taking advantage of Google n-grams and of existing language models. In this work we describe the process that has led to the semi-automatic development of a taxonomy for olfactory information in four languages (English, French, German and Italian), detailing the different steps and the intermediate evaluations. Along with being multilingual, the taxonomy also includes temporal marks for olfactory terms, making it a valuable resource for historical content analysis. The resource has been released and is freely available.
Empowering NGOs in Countering Online Hate Messages
Studies on online hate speech have mostly focused on the automated detection
of harmful messages. Little attention has been devoted so far to the
development of effective strategies to fight hate speech, in particular through
the creation of counter-messages. While existing manual scrutiny and
intervention strategies are time-consuming and not scalable, advances in
natural language processing have the potential to provide a systematic approach
to hatred management. In this paper, we introduce a novel ICT platform that NGO
operators can use to monitor and analyze social media data, along with a
counter-narrative suggestion tool. Our platform aims at increasing the
efficiency and effectiveness of operators' activities against islamophobia. We
test the platform with more than one hundred NGO operators in three countries
through qualitative and quantitative evaluation. Results show that NGOs favor
the platform solution with the suggestion tool, and that the time required to
produce counter-narratives significantly decreases.
Comment: Preprint of the paper published in Online Social Networks and Media
Journal (OSNEM)